VMdeploy: Improving Best-Effort Job Management in Grid’5000

Authors

  • Jérôme Gallard
  • Adrien Lèbre
  • Christine Morin
Abstract

Virtualization technologies have recently gained a lot of interest in Grid computing because they allow flexible resource management. Grid'5000 (G5K) is a French national Grid platform used in computer science research to experiment with all layers of Grid software. Computer scientists reserve G5K nodes prior to their experiments. In G5K, some low-priority jobs are executed in best-effort mode during the idle time slots of nodes that are not part of any reservation. However, best-effort jobs may be killed at any time by the Grid job scheduler when the nodes they use become subject to a higher-priority reservation. This behaviour potentially leads to a huge waste of compute time, or at least requires users to handle checkpointing of their best-effort jobs themselves. In this paper, we describe the design and implementation of the VMdeploy framework, which executes best-effort jobs in virtual machines in order to solve the best-effort issue in the G5K platform. VMdeploy manages snapshots of the best-effort jobs transparently to their users and thus ensures the progress of these jobs, avoiding most of the waste of resources. Results of a preliminary experimental evaluation are presented. While designed in the context of G5K, VMdeploy can be used in combination with any job scheduler in clusters and grids.

Key-words: Virtualization, Grid, Best-Effort jobs, Scheduling, Resource Management.

∗ The INRIA team carries out this research work in the framework of the XtreemOS project partially funded by the European Commission under contract #FP6-033576.
† INRIA Rennes Bretagne Atlantique, Rennes, France
‡ EMN, France

inria-00346740, version 3 - 6 Jul 2009

VMdeploy: How to Improve the Management of Best-Effort Jobs in Grid'5000

Résumé (translated from French): Virtualization technologies have recently attracted growing interest in the Grid field, mainly because they allow greater flexibility in resource management. Grid'5000 (G5K) is a French national Grid used for large-scale scientific experiments. To carry out their experiments, G5K users must reserve their nodes. In G5K, low-priority ("best-effort") jobs are executed on available nodes, that is, nodes that are not part of any reservation. These best-effort jobs can be removed from the nodes at any time by the Grid scheduler when higher-priority jobs are submitted. This behaviour potentially leads to lost compute time, or forces users to set up checkpoint save/restore mechanisms for their best-effort jobs. In this document, we describe the architecture and implementation of VMdeploy, our prototype. VMdeploy exploits the capabilities of virtual machines (VMs) to execute best-effort jobs in order to optimize their management. VMdeploy handles the creation, migration, and suspension/restart of VMs transparently to users so as to minimize the loss of compute time. The first experiments we present prove conclusive. Finally, although it was designed in the context of G5K, VMdeploy can be used with any cluster or grid scheduler.

Keywords: Virtualization, Grid, Best-effort jobs, Scheduling, Resource management.


Similar resources

An Integrated Processor Allocation and Job Scheduling Approach to Workload Management on Computing Grid

Processor allocation and job scheduling are two complementary techniques for improving the performance of parallel systems. This paper presents an effort in studying the issues of processor allocation and job scheduling on the emerging computing grid platform and developing an integrated approach to efficient workload management. The experimental results indicate that through careful design of ...


Job Scheduling with Resource Reservation and Prediction Mechanisms

Grids link together computers, data, sensors, large scale scientific instruments, visualization systems, networks and people. They can provide very large pools of computer resources, enable distributed collaborations and deliver increased efficiency and on-demand computing capabilities. The complexity of Grids on one hand and the requirements towards performance and capability on the other hand...


Improving Mobile Grid Performance Using Fuzzy Job Replica Count Determiner

Grid computing is a term referring to the combination of computer resources from multiple administrative domains to reach a common computational platform. Mobile computing is a generic term for the use of movable, handheld devices with wireless communication for processing data. Mobile computing focuses on providing access to data, information, services and communications anywhere an...


SLA-Based Job Submission and Scheduling with the Globus Toolkit 4

High performance computing is nowadays mostly performed in a best effort fashion. This is surprising as the closely related topic of grid computing, which deals with the federation of resources from multiple domains in order to support large jobs, and cloud computing, which promises seemingly infinite amounts of compute and storage, both offer quality of service (QoS), albeit in different ways....



Publication date: 2008